
    Gap Amplification for Small-Set Expansion via Random Walks

    In this work, we achieve gap amplification for the Small-Set Expansion problem. Specifically, we show that an instance of the Small-Set Expansion problem with completeness $\epsilon$ and soundness $\frac{1}{2}$ is at least as difficult as Small-Set Expansion with completeness $\epsilon$ and soundness $f(\epsilon)$, for any function $f(\epsilon)$ which grows faster than $\sqrt{\epsilon}$. We achieve this amplification via random walks -- our gadget is the graph with adjacency matrix corresponding to a random walk on the original graph. An interesting feature of our reduction is that, unlike gap amplification via parallel repetition, the size of the instances (number of vertices) produced by the reduction remains the same.
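
    The gadget lends itself to a compact illustration. The sketch below is our own minimal rendering, not code from the paper: it builds the $t$-step random-walk transition matrix $W^t$, where $W = D^{-1}A$, for an input graph, and shows that the vertex set, and hence the instance size, is unchanged. The function name and the choice of a 5-cycle are ours.

```python
# Minimal sketch (not the paper's exact construction): build the t-step
# random-walk matrix of an input graph. The vertex set is unchanged; only
# the edge structure (here, transition weights) changes.
import numpy as np

def random_walk_power(A: np.ndarray, t: int) -> np.ndarray:
    """Return the t-step random-walk transition matrix W^t, where
    W = D^{-1} A is the single-step walk on the graph with adjacency A."""
    degrees = A.sum(axis=1)
    W = A / degrees[:, None]          # row-stochastic transition matrix
    return np.linalg.matrix_power(W, t)

# Example: a 5-cycle; the number of vertices is preserved by the reduction.
n = 5
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
W3 = random_walk_power(A, 3)
print(W3.shape)  # (5, 5): same vertex count as the input graph
```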

    Braess's paradox for the spectral gap in random graphs and delocalization of eigenvectors

    We study how the spectral gap of the normalized Laplacian of a random graph changes when an edge is added to or removed from the graph. There are known examples of graphs where, perhaps counterintuitively, adding an edge can decrease the spectral gap, a phenomenon that is analogous to Braess's paradox in traffic networks. We show that this is often the case in random graphs in a strong sense. More precisely, we show that for typical instances of Erdős-Rényi random graphs $G(n,p)$ with constant edge density $p \in (0,1)$, the addition of a random edge will decrease the spectral gap with positive probability, strictly bounded away from zero. To do this, we prove a new delocalization result for eigenvectors of the Laplacian of $G(n,p)$, which might be of independent interest.
    Comment: Version 2, minor changes
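
    The claimed phenomenon is easy to probe empirically. The following hedged sketch is our illustration, not the paper's proof technique: it samples $G(n,p)$ at constant density, adds a uniformly random non-edge, and counts how often the spectral gap of the normalized Laplacian drops. All parameter names are our choices.

```python
# Empirical check of the Braess-type phenomenon: does adding a random
# non-edge to G(n, p) decrease the spectral gap of the normalized
# Laplacian L = I - D^{-1/2} A D^{-1/2}?
import numpy as np

rng = np.random.default_rng(0)

def spectral_gap(A):
    d = A.sum(axis=1)
    Dinv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(A)) - Dinv_sqrt @ A @ Dinv_sqrt
    eig = np.linalg.eigvalsh(L)       # ascending; eig[0] is ~0
    return eig[1]                     # smallest nonzero eigenvalue

n, p, trials, decreased = 100, 0.5, 50, 0
for _ in range(trials):
    A = np.triu((rng.random((n, n)) < p).astype(float), 1)
    A = A + A.T                       # simple undirected G(n, p)
    i, j = rng.choice(n, size=2, replace=False)
    while A[i, j] == 1:               # resample until we hit a non-edge
        i, j = rng.choice(n, size=2, replace=False)
    gap_before = spectral_gap(A)
    A[i, j] = A[j, i] = 1
    decreased += spectral_gap(A) < gap_before
print(f"gap decreased in {decreased}/{trials} trials")
```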

    Global and Local Information in Clustering Labeled Block Models

    The stochastic block model is a classical cluster-exhibiting random graph model that has been widely studied in statistics, physics and computer science. In its simplest form, the model is a random graph with two equal-sized clusters, with intra-cluster edge probability p, and inter-cluster edge probability q. We focus on the sparse case, i.e., p, q = O(1/n), which is practically more relevant and also mathematically more challenging. A conjecture of Decelle, Krzakala, Moore and Zdeborová, based on ideas from statistical physics, predicted a specific threshold for clustering. The negative direction of the conjecture was proved by Mossel, Neeman and Sly (2012), and more recently the positive direction was proven independently by Massoulié and by Mossel, Neeman, and Sly. In many real network clustering problems, nodes contain information as well. We study the interplay between node and network information in clustering by studying a labeled block model, where in addition to the edge information, the true cluster labels of a small fraction of the nodes are revealed. In the case of two clusters, we show that below the threshold, a small amount of node information does not affect recovery. On the other hand, we show that for any small amount of information efficient local clustering is achievable as long as the number of clusters is sufficiently large (as a function of the amount of revealed information).
    Comment: 24 pages, 2 figures. A short abstract describing these results will appear in proceedings of RANDOM 201
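
    For concreteness, here is a minimal sampler for the labeled block model as we read it from the abstract: a sparse two-cluster SBM with edge probabilities $a/n$ and $b/n$, together with the true labels of a small revealed fraction $\delta$ of nodes. The notation ($a$, $b$, $\delta$) and the Bernoulli label assignment (clusters balanced in expectation rather than exactly equal-sized) are our simplifications, not the paper's.

```python
# Sketch of the labeled sparse SBM: two clusters, intra-/inter-cluster
# edge probabilities a/n and b/n, and a delta-fraction of revealed labels.
import numpy as np

rng = np.random.default_rng(1)

def labeled_sbm(n, a, b, delta):
    labels = rng.integers(0, 2, size=n)           # clusters balanced in expectation
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, a / n, b / n)          # sparse regime: O(1/n)
    A = np.triu((rng.random((n, n)) < probs).astype(float), 1)
    A = A + A.T
    revealed = rng.random(n) < delta              # small revealed fraction
    return A, labels, revealed

A, labels, revealed = labeled_sbm(n=1000, a=5.0, b=1.0, delta=0.05)
print(A.sum() / 2, revealed.sum())  # ~O(n) edges; ~delta*n revealed labels
```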

    Spectral clustering in the Gaussian mixture block model

    Gaussian mixture block models are distributions over graphs that strive to model modern networks: to generate a graph from such a model, we associate each vertex $i$ with a latent feature vector $u_i \in \mathbb{R}^d$ sampled from a mixture of Gaussians, and we add edge $(i,j)$ if and only if the feature vectors are sufficiently similar, in that $\langle u_i, u_j \rangle \ge \tau$ for a pre-specified threshold $\tau$. The different components of the Gaussian mixture represent the fact that there may be different types of nodes with different distributions over features -- for example, in a social network each component represents the different attributes of a distinct community. Natural algorithmic tasks associated with these networks are embedding (recovering the latent feature vectors) and clustering (grouping nodes by their mixture component). In this paper we initiate the study of clustering and embedding graphs sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors $d \to \infty$ as the size of the network $n \to \infty$. This high-dimensional setting is most appropriate in the context of modern networks, in which we think of the latent feature space as being high-dimensional. We analyze the performance of canonical spectral clustering and embedding algorithms for such graphs in the case of 2-component spherical Gaussian mixtures, and begin to sketch out the information-computation landscape for clustering and embedding in these models.
    Comment: 41 pages
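
    The generative model and a canonical spectral approach can be sketched in a few lines. The following is an illustrative implementation under our own parameter choices (a 2-component spherical mixture with symmetric means and $\tau = 0$), not the paper's analyzed algorithm: sample latent vectors, threshold pairwise inner products to form edges, then cluster by the sign of the second adjacency eigenvector.

```python
# Hedged sketch: sample a 2-component spherical Gaussian mixture block model
# and cluster with a simple spectral method. Parameters (n, d, sep, tau) are
# illustrative choices.
import numpy as np

rng = np.random.default_rng(2)

n, d, sep, tau = 400, 50, 2.0, 0.0
z = rng.integers(0, 2, size=n)                  # latent mixture components
means = np.stack([+sep * np.ones(d), -sep * np.ones(d)]) / np.sqrt(d)
U = means[z] + rng.standard_normal((n, d))      # latent feature vectors u_i
A = (U @ U.T >= tau).astype(float)              # edge iff <u_i, u_j> >= tau
np.fill_diagonal(A, 0)

# Spectral clustering: sign of the second eigenvector of the adjacency
# matrix (the top eigenvector mostly reflects overall degrees/density).
vals, vecs = np.linalg.eigh(A)
guess = (vecs[:, -2] > 0).astype(int)
acc = max(np.mean(guess == z), np.mean(guess != z))  # accuracy up to label swap
print(f"clustering accuracy ~ {acc:.2f}")
```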

    Computational Barriers to Estimation from Low-Degree Polynomials

    One fundamental goal of high-dimensional statistics is to detect or recover structure from noisy data. In many cases, the data can be faithfully modeled by a planted structure (such as a low-rank matrix) perturbed by random noise. But even for these simple models, the computational complexity of estimation is sometimes poorly understood. A growing body of work studies low-degree polynomials as a proxy for computational complexity: it has been demonstrated in various settings that low-degree polynomials of the data can match the statistical performance of the best known polynomial-time algorithms for detection. While prior work has studied the power of low-degree polynomials for the task of detecting the presence of hidden structures, it has failed to address the estimation problem in settings where detection is qualitatively easier than estimation. In this work, we extend the method of low-degree polynomials to address problems of estimation and recovery. For a large class of "signal plus noise" problems, we give a user-friendly lower bound for the best possible mean squared error achievable by any degree-D polynomial. To our knowledge, this is the first instance in which the low-degree polynomial method can establish low-degree hardness of recovery problems where the associated detection problem is easy. As applications, we give a tight characterization of the low-degree minimum mean squared error for the planted submatrix and planted dense subgraph problems, resolving (in the low-degree framework) open problems about the computational complexity of recovery in both cases.
    Comment: 38 pages
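
    As a toy instance of the estimation question (our worked example, far simpler than the paper's general bound), consider the scalar model $Y = \lambda x + Z$ with $x$ uniform in $\{-1,+1\}$ and $Z \sim N(0,1)$. The best degree-1 polynomial estimator $cY$ has $c^* = \lambda/(\lambda^2+1)$ and mean squared error $1/(\lambda^2+1)$, which the simulation below verifies.

```python
# Toy degree-1 estimation example (ours, not the paper's general bound):
# for Y = lam * x + Z with x uniform in {-1, +1} and Z ~ N(0, 1), the best
# linear estimator c * Y has c* = E[xY]/E[Y^2] = lam / (lam^2 + 1), giving
# mean squared error 1 / (lam^2 + 1). Verify by Monte Carlo.
import numpy as np

rng = np.random.default_rng(3)

lam, m = 1.5, 200_000
x = rng.choice([-1.0, 1.0], size=m)
Y = lam * x + rng.standard_normal(m)

c_star = lam / (lam**2 + 1)                 # argmin_c E[(x - c*Y)^2]
mse_emp = np.mean((x - c_star * Y) ** 2)
mse_theory = 1.0 / (lam**2 + 1)
print(f"empirical MSE {mse_emp:.4f} vs theory {mse_theory:.4f}")
```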